974 resultados para COMPARATIVE GENOMICS


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Exponential growth of genomic data in the last two decades has made manual analyses impractical for all but trial studies. As genomic analyses have become more sophisticated, and move toward comparisons across large datasets, computational approaches have become essential. One of the most important biological questions is to understand the mechanisms underlying gene regulation. Genetic regulation is commonly investigated and modelled through the use of transcriptional regulatory network (TRN) structures. These model the regulatory interactions between two key components: transcription factors (TFs) and the target genes (TGs) they regulate. Transcriptional regulatory networks have proven to be invaluable scientific tools in Bioinformatics. When used in conjunction with comparative genomics, they have provided substantial insights into the evolution of regulatory interactions. Current approaches to regulatory network inference, however, omit two additional key entities: promoters and transcription factor binding sites (TFBSs). In this study, we attempted to explore the relationships among these regulatory components in bacteria. Our primary goal was to identify relationships that can assist in reducing the high false positive rates associated with transcription factor binding site predictions and thereupon enhance the reliability of the inferred transcription regulatory networks. In our preliminary exploration of relationships between the key regulatory components in Escherichia coli transcription, we discovered a number of potentially useful features. The combination of location score and sequence dissimilarity scores increased de novo binding site prediction accuracy by 13.6%. Another important observation made was with regards to the relationship between transcription factors grouped by their regulatory role and corresponding promoter strength. Our study of E.coli ��70 promoters, found support at the 0.1 significance level for our hypothesis | that weak promoters are preferentially associated with activator binding sites to enhance gene expression, whilst strong promoters have more repressor binding sites to repress or inhibit gene transcription. Although the observations were specific to �70, they nevertheless strongly encourage additional investigations when more experimentally confirmed data are available. In our preliminary exploration of relationships between the key regulatory components in E.coli transcription, we discovered a number of potentially useful features { some of which proved successful in reducing the number of false positives when applied to re-evaluate binding site predictions. Of chief interest was the relationship observed between promoter strength and TFs with respect to their regulatory role. Based on the common assumption, where promoter homology positively correlates with transcription rate, we hypothesised that weak promoters would have more transcription factors that enhance gene expression, whilst strong promoters would have more repressor binding sites. The t-tests assessed for E.coli �70 promoters returned a p-value of 0.072, which at 0.1 significance level suggested support for our (alternative) hypothesis; albeit this trend may only be present for promoters where corresponding TFBSs are either all repressors or all activators. Nevertheless, such suggestive results strongly encourage additional investigations when more experimentally confirmed data will become available. Much of the remainder of the thesis concerns a machine learning study of binding site prediction, using the SVM and kernel methods, principally the spectrum kernel. Spectrum kernels have been successfully applied in previous studies of protein classification [91, 92], as well as the related problem of promoter predictions [59], and we have here successfully applied the technique to refining TFBS predictions. The advantages provided by the SVM classifier were best seen in `moderately'-conserved transcription factor binding sites as represented by our E.coli CRP case study. Inclusion of additional position feature attributes further increased accuracy by 9.1% but more notable was the considerable decrease in false positive rate from 0.8 to 0.5 while retaining 0.9 sensitivity. Improved prediction of transcription factor binding sites is in turn extremely valuable in improving inference of regulatory relationships, a problem notoriously prone to false positive predictions. Here, the number of false regulatory interactions inferred using the conventional two-component model was substantially reduced when we integrated de novo transcription factor binding site predictions as an additional criterion for acceptance in a case study of inference in the Fur regulon. This initial work was extended to a comparative study of the iron regulatory system across 20 Yersinia strains. This work revealed interesting, strain-specific difierences, especially between pathogenic and non-pathogenic strains. Such difierences were made clear through interactive visualisations using the TRNDifi software developed as part of this work, and would have remained undetected using conventional methods. This approach led to the nomination of the Yfe iron-uptake system as a candidate for further wet-lab experimentation due to its potential active functionality in non-pathogens and its known participation in full virulence of the bubonic plague strain. Building on this work, we introduced novel structures we have labelled as `regulatory trees', inspired by the phylogenetic tree concept. Instead of using gene or protein sequence similarity, the regulatory trees were constructed based on the number of similar regulatory interactions. While the common phylogentic trees convey information regarding changes in gene repertoire, which we might regard being analogous to `hardware', the regulatory tree informs us of the changes in regulatory circuitry, in some respects analogous to `software'. In this context, we explored the `pan-regulatory network' for the Fur system, the entire set of regulatory interactions found for the Fur transcription factor across a group of genomes. In the pan-regulatory network, emphasis is placed on how the regulatory network for each target genome is inferred from multiple sources instead of a single source, as is the common approach. The benefit of using multiple reference networks, is a more comprehensive survey of the relationships, and increased confidence in the regulatory interactions predicted. In the present study, we distinguish between relationships found across the full set of genomes as the `core-regulatory-set', and interactions found only in a subset of genomes explored as the `sub-regulatory-set'. We found nine Fur target gene clusters present across the four genomes studied, this core set potentially identifying basic regulatory processes essential for survival. Species level difierences are seen at the sub-regulatory-set level; for example the known virulence factors, YbtA and PchR were found in Y.pestis and P.aerguinosa respectively, but were not present in both E.coli and B.subtilis. Such factors and the iron-uptake systems they regulate, are ideal candidates for wet-lab investigation to determine whether or not they are pathogenic specific. In this study, we employed a broad range of approaches to address our goals and assessed these methods using the Fur regulon as our initial case study. We identified a set of promising feature attributes; demonstrated their success in increasing transcription factor binding site prediction specificity while retaining sensitivity, and showed the importance of binding site predictions in enhancing the reliability of regulatory interaction inferences. Most importantly, these outcomes led to the introduction of a range of visualisations and techniques, which are applicable across the entire bacterial spectrum and can be utilised in studies beyond the understanding of transcriptional regulatory networks.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Over the past decade the mitochondrial (mt) genome has become the most widely used genomic resource available for systematic entomology. While the availability of other types of ‘–omics’ data – in particular transcriptomes – is increasing rapidly, mt genomes are still vastly cheaper to sequence and are far less demanding of high quality templates. Furthermore, almost all other ‘–omics’ approaches also sequence the mt genome, and so it can form a bridge between legacy and contemporary datasets. Mitochondrial genomes have now been sequenced for all insect orders, and in many instances representatives of each major lineage within orders (suborders, series or superfamilies depending on the group). They have also been applied to systematic questions at all taxonomic scales from resolving interordinal relationships (e.g. Cameron et al., 2009; Wan et al., 2012; Wang et al., 2012), through many intraordinal (e.g. Dowton et al., 2009; Timmermans et al., 2010; Zhao et al. 2013a) and family-level studies (e.g. Nelson et al., 2012; Zhao et al., 2013b) to population/biogeographic studies (e.g. Ma et al., 2012). Methodological issues around the use of mt genomes in insect phylogenetic analyses and the empirical results found to date have recently been reviewed by Cameron (2014), yet the technical aspects of sequencing and annotating mt genomes were not covered. Most papers which generate new mt genome report their methods in a simplified form which can be difficult to replicate without specific knowledge of the field. Published studies utilize a sufficiently wide range of approaches, usually without justification for the one chosen, that confusion about commonly used jargon such as ‘long PCR’ and ‘primer walking’ could be a serious barrier to entry. Furthermore, sequenced mt genomes have been annotated (gene locations defined) to wildly varying standards and improving data quality through consistent annotation procedures will benefit all downstream users of these datasets. The aims of this review are therefore to: 1. Describe in detail the various sequencing methods used on insect mt genomes; 2. Explore the strengths/weakness of different approaches; 3. Outline the procedures and software used for insect mt genome annotation, and; 4. Highlight quality control steps used for new annotations, and to improve the re-annotation of previously sequenced mt genomes used in systematic or comparative research.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background Chlamydia pecorum is an important pathogen of domesticated livestock including sheep, cattle and pigs. This pathogen is also a key factor in the decline of the koala in Australia. We sequenced the genomes of three koala C. pecorum strains, isolated from the urogenital tracts and conjunctiva of diseased koalas. The genome of the C. pecorum VR629 (IPA) strain, isolated from a sheep with polyarthritis, was also sequenced. Results Comparisons of the draft C. pecorum genomes against the complete genomes of livestock C. pecorum isolates revealed that these strains have a conserved gene content and order, sharing a nucleotide sequence similarity > 98%. Single nucleotide polymorphisms (SNPs) appear to be key factors in understanding the adaptive process. Two regions of the chromosome were found to be accumulating a large number of SNPs within the koala strains. These regions include the Chlamydia plasticity zone, which contains two cytotoxin genes (toxA and toxB), and a 77 kbp region that codes for putative type III effector proteins. In one koala strain (MC/MarsBar), the toxB gene was truncated by a premature stop codon but is full-length in IPTaLE and DBDeUG. Another five pseudogenes were also identified, two unique to the urogenital strains C. pecorum MC/MarsBar and C. pecorum DBDeUG, respectively, while three were unique to the koala C. pecorum conjunctival isolate IPTaLE. An examination of the distribution of these pseudogenes in C. pecorum strains from a variety of koala populations, alongside a number of sheep and cattle C. pecorum positive samples from Australian livestock, confirmed the presence of four predicted pseudogenes in koala C. pecorum clinical samples. Consistent with our genomics analyses, none of these pseudogenes were observed in the livestock C. pecorum samples examined. Interestingly, three SNPs resulting in pseudogenes identified in the IPTaLE isolate were not found in any other C. pecorum strain analysed, raising questions over the origin of these point mutations. Conclusions The genomic data revealed that variation between C. pecorum strains were mainly due to the accumulation of SNPs, some of which cause gene inactivation. The identification of these genetic differences will provide the basis for further studies to understand the biology and evolution of this important animal pathogen. Keywords: Chlamydia pecorum; Single nucleotide polymorphism; Pseudogene; Cytotoxin

Relevância:

100.00% 100.00%

Publicador:

Resumo:

In this study, we have identified the possible genetic factors responsible for fowl-adaptation of Salmonella enterica serovar Gallinarum (S. Gallinarum). By comparing the genes related to Salmonella pathogenicity islands (SPI) of S. Gallinarum with those of Salmonella enterica serovar Enteritidis (S. Enteritidis) we have identified twenty-four positively selected genes. Our results suggest that the genes encoding the structural components of SPI-2 encoded type three secretion apparatus (TTSS) and the effector proteins that are secreted via SPI-1 encoded TTSS have evolved under positive selection pressure in these serovars. We propose that these positively selected genes play important roles in conferring different host-specificities to S. Gallinarum and S. Enteritidis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Ferric uptake regulator (Fur) is a transcriptional regulator controlling the expression of genes involved in iron homeostasis and plays an important role in pathogenesis. Fur-regulated sRNAs/CDSs were found to have upstream Fur Binding Sites (FBS). We have constructed a Positional Weight Matrix from 100 known FBS (19 nt) and tracked the `Orphan' FBSs. Possible Fur regulated sRNAs and CDSs were identified by comparing their genomic locations with the `Orphan' FBSs identified. Thirty-eight `novel' and all known Fur regulated sRNAs in nine proteobacteria were identified. In addition, we identified high scoring FBSs in the promoter regions of the 304 CDSs and 68 of them were involved in siderophore biosynthesis, iron-transporters, two-component system, starch/sugar metabolism, sulphur/methane metabolism, etc. The present study shows that the Fur regulator controls the expression of genes involved in diverse metabolic activities and it is not limited to iron metabolism alone. (C) 2012 Elsevier B.V. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The chicken is the most extensively studied species in birds and thus constitutes an ideal reference for comparative genomics in birds. Comparative cytogenetic studies indicate that the chicken has retained many chromosome characters of the ancestral avia

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Chapter 2 of this thesis describes the sequence analysis of 14 bifidobacterial genomes from various species of the genus Bifidobacterium, and the determination of their open pan-genome trend. This analysis first determined the total number of genes to be considered as the reservoir of functions available to representatives of this genus. Many identified genes are still uncharacterized, but may be involved in the adaptation to the gut environment. This comparative genomic analysis also determined a pool of ortholog functions used to infer their phylogenetic relationship, thereby providing a more robust approach compared to that based solely on the16S rRNA-encoding gene. The genome sequence of an isolate from the insect hindgut Bifidobacterium asteroides PRL2011 was fully characterized in Chapter 3, surprisingly revealing a putative respiratory metabolism, which was also found to be present in other insect isolates, suggesting that respiration was an ancient feature of this genus, but also an adaptative trait to different atmosferic oxygen levels. Chapter 4 of this thesis outlines a comparative study which focused on the analysis of representatives of the Bifidobacterium breve species, revealing that the genetic variability among members of this species principally consists of genes with a role in adaptation to host environment and gut colonization. Finally, Chapter 5 describes the analysis of the genome sequence of Bifidobacterium animalis subsp. lactis BLC1, a probiotic bacterium widely used in food industries as an ingredient of functional foods, providing information that will allow future investigations of this species.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Helicobacter pylori is a gastric pathogen which infects ~50% of the global population and can lead to the development of gastritis, gastric and duodenal ulcers and carcinoma. Genome sequencing of H. pylori revealed high levels of genetic variability; this pathogen is known for its adaptability due to mechanisms including phase variation, recombination and horizontal gene transfer. Motility is essential for efficient colonisation by H. pylori. The flagellum is a complex nanomachine which has been studied in detail in E. coli and Salmonella. In H. pylori, key differences have been identified in the regulation of flagellum biogenesis, warranting further investigation. In this study, the genomes of two H. pylori strains (CCUG 17874 and P79) were sequenced and published as draft genome sequences. Comparative studies identified the potential role of restriction modification systems and the comB locus in transformation efficiency differences between these strains. Core genome analysis of 43 H. pylori strains including 17874 and P79 defined a more refined core genome for the species than previously published. Comparative analysis of the genome sequences of strains isolated from individuals suffering from H. pylori related diseases resulted in the identification of “disease-specific” genes. Structure-function analysis of the essential motility protein HP0958 was performed to elucidate its role during flagellum assembly in H. pylori. The previously reported HP0958-FliH interaction could not be substantiated in this study and appears to be a false positive. Site-directed mutagenesis confirmed that the coiled-coil domain of HP0958 is involved in the interaction with RpoN (74-284), while the Zn-finger domain is required for direct interaction with the full length flaA mRNA transcript. Complementation of a non-motile hp0958-null derivative strain of P79 with site-directed mutant alleles of hp0958 resulted in cells producing flagellar-type extrusions from non-polar positions. Thus, HP0958 may have a novel function in spatial localisation of flagella in H. pylori

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Bacterioplankton of the SAR11 clade are the most abundant microorganisms in marine systems, usually representing 25% or more of the total bacterial cells in seawater worldwide. SAR11 is divided into subclades with distinct spatiotemporal distributions (ecotypes), some of which appear to be specific to deep water. Here we examine the genomic basis for deep ocean distribution of one SAR11 bathytype (depth-specific ecotype), subclade Ic. Four single-cell Ic genomes, with estimated completeness of 55%-86%, were isolated from 770 m at station ALOHA and compared with eight SAR11 surface genomes and metagenomic datasets. Subclade Ic genomes dominated metagenomic fragment recruitment below the euphotic zone. They had similar COG distributions, high local synteny and shared a large number (69%) of orthologous clusters with SAR11 surface genomes, yet were distinct at the 16S rRNA gene and amino-acid level, and formed a separate, monophyletic group in phylogenetic trees. Subclade Ic genomes were enriched in genes associated with membrane/cell wall/envelope biosynthesis and showed evidence of unique phage defenses. The majority of subclade Ic-specfic genes were hypothetical, and some were highly abundant in deep ocean metagenomic data, potentially masking mechanisms for niche differentiation. However, the evidence suggests these organisms have a similar metabolism to their surface counterparts, and that subclade Ic adaptations to the deep ocean do not involve large variations in gene content, but rather more subtle differences previously observed deep ocean genomic data, like preferential amino-acid substitutions, larger coding regions among SAR11 clade orthologs, larger intergenic regions and larger estimated average genome size.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Most Bursaphelenchus species are fungal feeding nematodes that colonize dead or dying trees. However, Bursaphelenchus xylophilus , the pine wood nematode, is also a pathogen of trees and is the causal agent of pine wilt disease. B. xylophilus is native to North America and here it causes little damage to trees. Where it is introduced to new regions it causes huge damage. The most severely affected areas are found in the Far East but more recently B. xylophilus has been introduced into Portugal and the potential for damage here is also high. As incidence and severity of pine wilt disease are linked to temperature we suggest that climate change is likely to exacerbate the problems caused by B. xylophilus and, in addition, will extend (northwards in Europe) the range in which pine wilt disease can occur. Here we review what is currently known about the interactions of B. xylophilus with its hosts, including recent developments in our understanding of the molecular biology of pathogenicity in the nematode. We also examine the potential developments that could be made by more widespread use of genomics tools to understand interactions between B. xylophilus , bacterial pathogens that have been implicated in disease and host trees.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Microarray based comparative genomic hybridisation (CGH) experiments have been used to study numerous biological problems including understanding genome plasticity in pathogenic bacteria. Typically such experiments produce large data sets that are difficult for biologists to handle. Although there are some programmes available for interpretation of bacterial transcriptomics data and CGH microarray data for looking at genetic stability in oncogenes, there are none specifically to understand the mosaic nature of bacterial genomes. Consequently a bottle neck still persists in accurate processing and mathematical analysis of these data. To address this shortfall we have produced a simple and robust CGH microarray data analysis process that may be automated in the future to understand bacterial genomic diversity. Results: The process involves five steps: cleaning, normalisation, estimating gene presence and absence or divergence, validation, and analysis of data from test against three reference strains simultaneously. Each stage of the process is described and we have compared a number of methods available for characterising bacterial genomic diversity, for calculating the cut-off between gene presence and absence or divergence, and shown that a simple dynamic approach using a kernel density estimator performed better than both established, as well as a more sophisticated mixture modelling technique. We have also shown that current methods commonly used for CGH microarray analysis in tumour and cancer cell lines are not appropriate for analysing our data. Conclusion: After carrying out the analysis and validation for three sequenced Escherichia coli strains, CGH microarray data from 19 E. coli O157 pathogenic test strains were used to demonstrate the benefits of applying this simple and robust process to CGH microarray studies using bacterial genomes.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background. The anaerobic spirochaete Brachyspira pilosicoli causes enteric disease in avian, porcine and human hosts, amongst others. To date, the only available genome sequence of B. pilosicoli is that of strain 95/1000, a porcine isolate. In the first intra-species genome comparison within the Brachyspira genus, we report the whole genome sequence of B. pilosicoli B2904, an avian isolate, the incomplete genome sequence of B. pilosicoli WesB, a human isolate, and the comparisons with B. pilosicoli 95/1000. We also draw on incomplete genome sequences from three other Brachyspira species. Finally we report the first application of the high-throughput Biolog phenotype screening tool on the B. pilosicoli strains for detailed comparisons between genotype and phenotype. Results. Feature and sequence genome comparisons revealed a high degree of similarity between the three B. pilosicoli strains, although the genomes of B2904 and WesB were larger than that of 95/1000 (~2,765, 2.890 and 2.596 Mb, respectively). Genome rearrangements were observed which correlated largely with the positions of mobile genetic elements. Through comparison of the B2904 and WesB genomes with the 95/1000 genome, features that we propose are non-essential due to their absence from 95/1000 include a peptidase, glycine reductase complex components and transposases. Novel bacteriophages were detected in the newly-sequenced genomes, which appeared to have involvement in intra- and inter-species horizontal gene transfer. Phenotypic differences predicted from genome analysis, such as the lack of genes for glucuronate catabolism in 95/1000, were confirmed by phenotyping. Conclusions. The availability of multiple B. pilosicoli genome sequences has allowed us to demonstrate the substantial genomic variation that exists between these strains, and provides an insight into genetic events that are shaping the species. In addition, phenotype screening allowed determination of how genotypic differences translated to phenotype. Further application of such comparisons will improve understanding of the metabolic capabilities of Brachyspira species.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Root nodule symbiosis (RNS) is one of the most efficient biological systems for nitrogen fixation and it occurs in 90% of genera in the Papilionoideae, the largest subfamily of legumes. Most papilionoid species show evidence of a polyploidy event occurred approximately 58 million years ago. Although polyploidy is considered to be an important evolutionary force in plants, the role of this papilionoid polyploidy event, especially its association with RNS, is not understood. In this study, we explored this role using an integrated comparative genomic approach and conducted gene expression comparisons and gene ontology enrichment analyses. The results show the following: (1) approximately a quarter of the papilionoid-polyploidy-derived duplicate genes are retained; (2) there is a striking divergence in the level of expression of gene duplicate pairs derived from the polyploidy event; and (3) the retained duplicates are frequently involved in the processes crucial for RNS establishment, such as symbiotic signalling, nodule organogenesis, rhizobial infection and nutrient exchange and transport. Thus, we conclude that the papilionoid polyploidy event might have further refined RNS and induced a more robust and enhanced symbiotic system. This conclusion partly explains the widespread occurrence of the Papilionoideae.